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We describe the implementation of methods of nonlinear time series analysis which are based 
on the paradigm of deterministic chaos. A variety of algorithms for data representation, prediction, 
noise reduction, dimension and Lyapunov estimation, and nonlinearity testing are discussed with 
particular emphasis on issues of implementation and choice of parameters. Computer programs that 
implement the resulting strategies are publicly available as the TISEAN software package. The use 
of each algorithm will be illustrated with a typical application. As to the theoretical background, 
we will essentially give pointers to the literature. 



LEAD PARAGRAPH 

Nonlinear time series analysis is becoming a more and more reliable tool for the study of complicated 
dynamics from measurements. The concept of low-dimensional chaos has proven to be fruitful in the 
understanding of many complex phenomena despite the fact that very few natural systems have actu- 
ally been found to be low dimensional deterministic in the sense of the theory. In order to evaluate the 
long term usefulness of the nonlinear time series approach as inspired by chaos theory it will be impor- 
tant that the corresponding methods become more widely accessible. This paper, while not a proper 
review on nonlinear time series analysis, tries to make a contribution to this process by describing the 
actual implementation of the algorithms, and their proper usage. Most of the methods require the choice 
of certain parameters for each specific time series application. We will try to give guidance in this re- 
spect. The scope and selection of topics in this article, as well as the implementational choices that have 
been made, correspond to the contents of the software package TISEAN which is publicly available from 
http: //www. mpipks-dresden. mpg.de/ ~ tisean. In fact, this paper can be seen as an extended manual 
for the TISEAN programs. It fills the gap between the technical documentation and the existing literature, 
providing the necessary entry points for a more thorough study of the theoretical background. 
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I. INTRODUCTION 

Deterministic chaos as a fundamental concept is by 
now well established and described in a rich literature. 
The mere fact that simple deterministic systems generi- 
cally exhibit complicated temporal behavior in the pres- 
ence of nonlinearity has influenced thinking and intuition 
in many fields. However, it has been questioned whether 
the relevance of chaos for the understanding of the time 
evolving world goes beyond that of a purely philosoph- 
ical paradigm. Accordingly major research efforts are 
dedicated to two related questions. The first question is 
if chaos theory can be used to gain a better understand- 
ing and interpretation of observed complex dynamical 
behavior. The second is if chaos theory can give an ad- 
vantage in predicting or controlling such time evolution. 
Time evolution as a system property can be measured by 
recording time series. Thus, nonlinear time series meth- 
ods will be the key to the answers of the above questions. 
This paper is intended to encourage the explorative use 
of such methods by a section of the scientific community 
which is not limited to chaos theorists. A range of algo- 
rithms has been made available in the form of computer 
programs by the TISEAN project [1]. Since this is fairly 
new territory, unguided use of the algorithms bears con- 
siderable risk of wrong interpretation and unintelligible 
or spurious results. In the present paper, the essential 
ideas behind the algorithms are summarized and point- 
ers to the existing literature are given. To avoid exces- 
sive redundancy with the text book [2] and the recent 
review [3], the derivation of the methods will be kept to 
a minimum. On the other hand, the choices that have 
been made in the implementation of the programs are 
discussed more thoroughly, even if this may seem quite 
technical at times. We will also point to possible alter- 
natives to the TISEAN implementation. 

Let us at this point mention a number of general ref- 
erences on the subject of nonlinear dynamics. At an 
introductory level, the book by Kaplan and Glass [4] 
is aimed at an interdisciplinary audience and provides 
a good intuitive understanding of the fundamentals of 
dynamics. The theoretical framework is thoroughly de- 
scribed by Ott [5], but also in the older books by Bcrge 
et al. [6] and by Schuster [7]. More advanced material is 
contained in the work by Katok and Hasselblatt [8]. A 
collection of research articles compiled by Ott et al. [9] 
covers some of the more applied aspects of chaos, like 
synchronization, control, and time series analysis. 

Nonlinear time series analysis based on this theoreti- 
cal paradigm is described in two recent monographs, one 
by Abarbanel [10] and one by Kantz and Schrciber [2]. 
While the former volume usually assumes chaoticity, the 
latter book puts some emphasis on practical applications 
to time series that are not manifestly found, nor simply 
assumed to be, deterministic chaotic. This is the ratio- 
nale we will also adopt in the present paper. A num- 
ber of older articles can be seen as reviews, including 



Grassberger et al. [11], Abarbanel et al. [12], as well as 
Kugiumtzis et al. [13,14]. The application of nonlinear 
time series analysis to real world measurements where 
determinism is unlikely to be present in a stronger sense, 
is reviewed in Schrciber [3]. Apart from these works, a 
number of conference proceedings volumes are devoted 
to chaotic time series, including Refs. [15-19]. 



A. Philosophy of the TISEAN implementation 

A number of different people have been credited for 
the saying that every complicated question has a simple 
answer which is wrong. Analysing a time series with a 
nonlinear approach is definitely a complicated problem. 
Simple answers have been repeatedly offered in the liter- 
ature, quoting numerical values for attractor dimensions 
for any conceivable system. The present implementation 
reflects our scepticism against such simple answers which 
are the inevitable result of using black box algorithms. 
Thus, for example, none of the "dimension" programs 
will actually print a number which can be quoted as the 
estimated attractor dimension. Instead, the correlation 
sum is computed and basic tools are provided for its inter- 
pretation. It is up to the scientist who does the analysis 
to put these results into their proper context and to infer 
what information she or he may find useful and plausi- 
ble. We should stress that this is not simply a question of 
error bars. Error bars don't tell about systematic errors 
and neither do they tell if the underlying assumptions 
are justified. 

The TISEAN project has emerged from work of the 
involved research groups over several years. Some of the 
programs are in fact based on code published in Ref. [2]. 
Nevertheless, we still like to see it as a starting point 
rather than a conclusive step. First of all, nonlinear time 
series analysis is still a rapidly evolving field, in partic- 
ular with respect to applications. This implies that the 
selection of topics in this article and the selection of al- 
gorithms implemented in TISEAN are highly biased to- 
wards what we know now and found useful so far. But 
even the well established concepts like dimension esti- 
mation and noise reduction leave considerable room for 
alternatives to the present implementation. Sometimes 
this resulted in two or more concurring and almost re- 
dundant programs entering the package. We have delib- 
erately not eliminated these redundancies since the user 
may benefit from having a choice. In any case it is healthy 
to know that for most of the algorithms the final word 
hasn't been spoken yet - nor is ever to be. 

While the TISEAN package does contain a number of 
tools for linear time series analysis (spectrum, autocor- 
relations, histograms, etc.), these are only suitable for a 
quick inspection of the data. Spectral or even ARMA 
estimation are industries in themselves and we refer the 
reader - and the user of TISEAN - to the existing litera- 
ture and available statistics software for optimal, up-to- 
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date implementations of these important methods. 

Some users will miss a convenient graphical interface 
to the programs. We felt that at this point the extra 
implcmcntational effort would not be justified by the ex- 
pected additional functionality of the package. Work is 
in progress, however, to provide interfaces to higher level 
mathematics (or statistics) software. 



B. General computational issues 

The natural basis to formulate nonlinear time series al- 
gorithms from chaos theory is a multi-dimensional phase 
space, rather than the time or the frequency domain. It 
will be essential for the global dynamics in this phase 
space to be nonlinear in order to fulfill the constraints 
of non-triviality and boundedness. Only in particular 
cases this nonlinear structure will be easily representable 
by a global nonlinear function. Instead, all properties 
will be expressed in terms of local quantities, often by 
suitable global averages. All local information will be 
gained from neighborhood relations of various kinds from 
time series elements. Thus a recurrent computational is- 
sue will be that of defining local neighborhoods in phase 
space. Finding neighbors in multidimensional space is a 
common problem of computational geometry. Multidi- 
mensional tree structures are widely used and have at- 
tractive theoretical properties. Finding all neighbors in 
a set of N vectors takes O(logiV) operations, thus the 
total operation count is 0(N log N). A fast alternative 
that is particularly efficient for relatively low dimensional 
structures embedded in multidimensional spaces is given 
by box-assisted neighbor search methods which can push 
the operation count down to O(N) under certain assump- 
tions. Both approaches are reviewed in Ref. [20] with 
particular emphasis on time series applications. In the 
TISEAN project, fast neighbor search is done using a 
box-assisted approach, as described in Ref. [2]. 

No matter in what space dimension we are working, we 
will define candidates for nearest neighbors in two dimen- 
sions using a grid of evenly spaced boxes. With a grid of 
spacing e, all neighbors of a vector x closer than epsilon 
must be located in the adjacent boxes. But not all points 
in the adjacent boxes are neighbors, they may be up to 
2e away in two dimensions and arbitrarily far in higher 
dimensions. Neighbors search is thus a two stage pro- 
cess. First the box-assisted data base has to be filled and 
then for each point a list of neighbors can be requested. 
There few instances where it is advisable to aban- 

don the fast neighbor search strategy. One example is 
the program noise that does nonlinear noise filtering in 
a data stream. It is supposed to start filtering soon after 
the first points have been recorded. Thus the neighbor 
data base cannot be constructed in the beginning. An- 
other exception is if quite short (< 500 points, say), high 
dimensional data are processed. Then the overhead for 
the neighbor search should be avoided and instead an op- 



timized straight 0(N 2 ) method be used, like it is done 
in c2naive. 

For portability, all programs expect time series data 
in column format represented by ASCII numbers. The 
column to be processed can be specified on the com- 
mand line. Although somewhat wasteful for storing data, 
ASCII is the least common divisor between the different 
ways most software can store data. All parameters can be 
set by adding options to the command, which, in many 
programs, just replace the default values. Obviously, re- 
lying on default settings is particularly dangerous in such 
a subtle field. Since almost all routines can read from 
standard input and write to standard output, programs 
can be part of pipelines. For example they can be called 
as filters from inside graphics software or other software 
tools which are able to execute shell commands. Also, 
data conversion or compression can be done "on the fly" 
this way. The reader here realizes that we are speak- 
ing of UNIX or LINUX platforms which seems to be the 
most appropriate environment. It is however expected 
that most of the programs will be ported to other envi- 
ronments in the near future. 

For those readers familiar with the programs published 
in Ref. [2] we should mention that these form the basis 
for a number of those TISEAN programs written in FOR- 
TRAN. The C programs, even if they do similar things, 
are fairly independent implementations. All C and C++ 
programs now use dynamic allocation of storage, for ex- 
ample. 

II. PHASE SPACE REPRESENTATION 

Deterministic dynamical systems describe the time 
evolution of a system in some phase space T C R d . 
They can be expressed for example by ordinary differ- 
ential equations 

x(t)=F(x(t)), (1) 

or in discrete time t — nAt by maps of the form 

x n+ i=f(x n ). (2) 

A time series can then be thought of as a sequence of ob- 
servations {s n — s(x„)} performed with some measure- 
ment function s(-). Since the (usually scalar) sequence 
{s n } in itself does not properly represent the (multidi- 
mensional) phase space of the dynamical system, one has 
to employ some technique to unfold the multidimensional 
structure using the available data. 

A. Delay coordinates 

The most important phase space reconstruction tech- 
nique is the method of delays. Vectors in a new space, the 
embedding space, are formed from time delayed values of 
the scalar measurements: 
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FIG. 1. Time delay representation of a human mag- 
neto-cardiogram. In the upper panel, a short delay time of 
10 ms is used to resolve the fast waveform corresponding to 
the contraction of the ventricle. In the lower panel, the slower 
recovery phase of the ventricle (small loop) is better resolved 
due to the use of a slightly longer delay of 40 ms. Such a plot 
can be conveniently be produced by a graphic tool such as 
gmiplot without generating extra data files. 



s n — ( s n — (m— 1)t; s n-(m-2)Ti • • • ? s n) ■ 



(3) 



The number m of elements is called the embedding di- 
mension, the time r is generally referred to as the delay 
or lag. Celebrated embedding theorems by Takens [21] 
and by Sauer et al. [22] state that if the sequence {s n } 
does indeed consist of scalar measurements of the state 
of a dynamical system, then under certain genericity as- 
sumptions, the time delay embedding provides a one-to- 
one image of the original set {x}, provided m is large 
enough. 

Time delay embeddings are used in almost all methods 
described in this paper. The implementation is straight- 
forward and does not require further explanation. If N 
scalar measurements are available, the number of embed- 
ding vectors is only N — (m — 1)t. This has to be kept in 
mind for the correct normalization of averaged quantities. 
There is a large literature on the "optimal" choice of the 
embedding parameters m and r. It turns out, however, 
that what constitutes the optimal choice largely depends 



on the application. We will therefore discuss the choice of 
embedding parameters occasionally together with other 
algorithms below. 

A stand-alone version of the delay procedure (delay, 
embed) is an important tool for the visual inspection 
of data, even though visualization is restricted to two 
dimensions, or at most two-dimensional projections of 
three-dimensional renderings. A good unfolding already 
in two dimensions may give some guidance about a good 
choice of the delay time for higher dimensional embed- 
dings. As an example let us show two different two- 
dimensional delay coordinate representations of a human 
magneto-cardiogram (Fig. 1). Note that we do neither 
assume nor claim that the magneto- (or electro-) cardio- 
gram is deterministic or even chaotic. Although in the 
particular case of cardiac recordings the use of time delay 
embeddings can be motivated theoretically [23], we here 
only want to use the embedding technique as a visualiza- 
tion tool. 



B. Embedding parameters 

A reasonable choice of the delay gains importance 
through the fact that we always have to deal with a finite 
amount of noisy data. Both noise and finiteness will pre- 
vent us from having access to infinitesimal length scales, 
so that the structure we want to exploit should persists 
up to the largest possible length scales. Depending on the 
type of structure we want to explore we have to choose 
a suitable time delay. Most obviously, delay unity for 
highly sampled flow data will yield delay vectors which 
are all concentrated around the diagonal in the embed- 
ding space and thus all structure perpendicular to the di- 
agonal is almost invisible. In [24] the terms redundancy 
and irrelevance were used to characterize the problem: 
Small delays yield strongly correlated vector elements, 
large delays lead to vectors whose components are (al- 
most) uncorrelated and the data are thus (seemingly) 
randomly distributed in the embedding space. Quite 
a number of papers have been published on the proper 
choice of the delay and embedding dimension. We have 
argued repeatedly [11,2,3] that an "optimal" embedding 
can - if at all - only be defined relative to a specific 
purpose for which the embedding is used. Nevertheless, 
some quantitative tools are available to guide the choice. 

The usual autocorrelation function (autocor, corr) 
and the time delayed mutual information (mutual), as 
well as visual inspection of delay representations with 
various lags provide important information about rea- 
sonable delay times while the false neighbors statistic 
(f alse_riearest) can give guidance about the proper em- 
bedding dimension. Again, "optimal" parameters cannot 
be thus established except in the context of a specific ap- 
plication. 
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FIG. 2. The fraction of false nearest neighbors as a function 
of the embedding dimension for noise free Lorenz (crosses) 
and Henon (filled circles) time series, as well as a Henon time 
series (open circles) corrupted by 10% of noise. 

1. Mutual information 

The time delayed mutual information was suggested 
by Fraser and Swinney [25] as a tool to determine a rea- 
sonable delay: Unlike the autocorrelation function, the 
mutual information takes into account also nonlinear cor- 
relations. One has to compute 

S=-£^(r)ln^, (4) 

where for some partition on the real numbers p, is the 
probability to find a time series value in the z-th interval, 
and Pij{r) is the joint probability that an observation 
falls into the z-th interval and the observation time r 
later falls into the j-th. In theory this expression has 
no systematic dependence on the size of the partition el- 
ements and can be quite easily computed. There exist 
good arguments that if the time delayed mutual infor- 
mation exhibits a marked minimum at a certain value of 
t, then this is a good candidate for a reasonable time 
delay. However, these arguments have to be modified 
when the embedding dimension exceeds two. Moreover, 
as will become transparent in the following sections, not 
all applications work optimally with the same delay. Our 
routine mutual uses Eq.(4), where the number of boxes 
of identical size and the maximal delay time has to be 
supplied. The adaptive algorithm used in [25] is more 
data intensive. Since we are not really interested in ab- 
solute values of the mutual information here but rather 
in the first minimum, the minimal implementation given 
here seems to be sufficient. The related generalized mu- 
tual information of order two can be defined using the 
correlation sum concept (Sec. VII, [26,27]). Estimation of 
the correlation entropy is explained in Sec. VII A. 



2. False nearest neighbors 

A method to determine the minimal sufficient embed- 
ding dimension m was proposed by Kennel et al. [28]. 
It is called the false nearest neighbor method. The idea 
is quite intuitive. Suppose the minimal embedding di- 
mension for a given time series {si} is mo- This means 
that in a mo-dimensional delay space the reconstructed 
attractor is a one-to-one image of the attractor in the 
original phase space. Especially, the topological proper- 
ties are preserved. Thus the neighbors of a given point 
are mapped onto neighbors in the delay space. Due to 
the assumed smoothness of the dynamics, neighborhoods 
of the points are mapped onto neighborhoods again. Of 
course the shape and the diameter of the neighborhoods 
is changed according to the Lyapunov exponents. But 
suppose now you embed in an m-dimensional space with 
m < m . Due to this projection the topological struc- 
ture is no longer preserved. Points are projected into 
neighborhoods of other points to which they wouldn't be- 
long in higher dimensions. These points are called false 
neighbors. If now the dynamics is applied, these false 
neighbors are not typically mapped into the image of the 
neighborhood, but somewhere else, so that the average 
"diameter" becomes quite large. 

The idea of the algorithm f alse_nearest is the fol- 
lowing. For each point s, in the time series look for its 
nearest neighbor Sj in a m-dimensional space. Calculate 
the distance — Sj\\. Iterate both points and compute 

If Ri exceeds a given heuristic threshold R t , this point 
is marked as having a false nearest neighbor [28]. The 
criterion that the embedding dimension is high enough 
is that the fraction of points for which Ri > R t is zero, 
or at least sufficiently small. Two examples are shown in 
Fig. 2. One is for the Lorenz system (crosses), one for 
the Henon system (filled circles), and one for a Henon 
time series corrupted by 10% of Gaussian white noise 
(open circles). One clearly sees that, as expected, m = 2 
is sufficient for the Henon and m = 3 for the Lorenz 
system, whereas the signature is less clear in the noisy 
case. 

The introduction of the false nearest neighbors con- 
cept and other ad hoc instruments was partly a reaction 
to the finding that many results obtained for the genuine 
invariants, like the correlation dimension, has been spu- 
rious due to caveats of the estimation procedure. In the 
latter case, serial correlations and small sample fluctua- 
tions can easily be mistaken for nonlinear determinism. 
It turns out, however, that the ad hoc quantities basically 
suffer from the same problems - which can be cured by the 
same precautions. The implementation f alsejiearest 
therefore allows to specify a minimal temporal separation 
of valid neighbors. 
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FIG. 3. Phase space representation of a human mag- 
neto-cardiogram using the two largest principal components. 
An initial embedding was chosen in m = 20 dimensions with 
a delay of r = 7 ms. The two components cover 70% of the 
variance of the initial embedding vectors. 

Other software for the analysis of false nearest neigh- 
bors is available in source form from Kennel [29]. Or, if 
you prefer to pay for a license, from Ref. [30]. 



C. Principal components 

It has been shown in Ref. [22] that the embedding 
technique can be generalized to a wide class of smooth 
transformations applied to a time delay embedding. In 
particular, if we introduce time delay coordinates {s„}, 
then almost every linear transformation of sufficient rank 
again leads to an embedding. A specific choice of linear 
transformation is known as principal component analy- 
sis, singular value decomposition, empirical orthogonal 
functions, Karhunen-Loeve decomposition, and probably 
a few other names. The technique is fairly widely used, 
for example to reduce multivariate data to a few major 
modes. There is a large literature, including textbooks 
like that by Jolliffe [31]. In the context of nonlinear sig- 
nal processing, the technique has been advocated among 
others by Broomhead and King [32]. 

The idea is to introduce a new set of orthonormal basis 
vectors in embedding space such that projections onto a 
given number of these directions preserve the maximal 
fraction of the variance of the original vectors. In other 
words, the error in making the projection is minimized for 
a given number of directions. Solving this minimization 
problem [31] leads to an eigenvalue problem. The desired 
principal directions can be obtained as the eigenvectors 
of the symmetric autocovariance matrix that correspond 
to the largest eigenvalues. The alternative and formally 
equivalent approach via the trajectory matrix is used in 
Ref. [32]. The latter is numerically more stable but in- 
volves the singular value decomposition of an N x m ma- 
trix for TV data points embedded in m dimensions, which 
can easily exceed computational resources for time series 



of even moderate length [33] . 

In almost all the algorithms described below, simple 
time delay embeddings can be substituted by principal 
components. In the TISEAN project (routines svd, pc), 
principal components are only provided as a stand-alone 
visualization tool and for linear filtering [34], see Sec. HE 
below. In any case, one first has to choose an initial time 
delay embedding and then a number of principal compo- 
nents to be kept. For the purpose of visualization, the 
latter is immediately restricted to two or at most three. 
In order to take advantage of the noise averaging effect of 
the principal component scheme, it is advisable to choose 
a much shorter delay than one would for an ordinary 
time delay embedding, while at the same time increasing 
the embedding dimension. Experimentation is recom- 
mended. Figure 3 shows the contributions of the first two 
principal components to the magneto-cardiogram shown 
in Fig. 1. 



D. Poincare sections 

Highly sampled data representing the continuous time 
of a differential equation are called flow data. They are 
characterized by the fact that errors in the direction tan- 
gent to the trajectory do neither shrink nor increase ex- 
ponentially (so called marginally stable direction) and 
thus possess one Lyapunov exponent which is zero, since 
any perturbation in this direction can be compensated by 
a simple shift of the time. Since in many data analysis 
tasks this direction is of low interest, one might wish to 
eliminate it. The theoretical concept to do so is called 
the Poincare section. After having chosen an (to — 1)- 
dimcnsional hyperplane in the TO-dimensional (embed- 
ding) space, one creates a compressed time series of only 
the intersections of the time continuous trajectory with 
this hyperplane in a predefined orientation. These data 
are then vector valued discrete time map like data. One 
can consider the projection of these (to — l)-dimensional 
vectors onto the real numbers as another measurement 
function (e.g. by recording the value of s n when s„ passes 
the Poincare surface), so that one can create a new scalar 
time scries if desirable. The program poincare con- 
structs a sequence of vectors from a scalar flow-like data 
set, if one specifies the hyperplane, the orientation, and 
the embedding parameters. The intersections of the dis- 
cretely sampled trajectory with the Poincare plane are 
computed by a third order interpolation. 

The placement of the Poincare surface is of high rele- 
vance for the usefulness of the result. An optimal surface 
maximizes the number of intersections, i.e. minimizes the 
time intervals between them, if at the same time the at- 
tractor remains connected. One avoids the trials and 
errors related to that if one defines a surface by the zero 
crossing of the temporal derivative of the signal, which 
is synonymous with collecting all maxima or all min- 
ima, respectively. This is done by extrema. However, 
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FIG. 4. Poincare surface of section using extrema: A 
two-dimensional delay plot of the sequence of maxima (top) 
and of the time intervals between successive maxima (bot- 
tom), without employing the option -t time, where time is 
the number of time steps after the last extremum during which 
no further extrema are searched for (here: 3), one finds some 
fake extrema due to noise showing up close to the diagonal 
of the delay representation. Data: Time series of the output 
power of a CO2 laser [35]. 



E. SVD filters 



There are at least two reasons to apply an SVD filter to 
time series data: Either, if one is working with flow data, 
one can implicitly determine the optimal time delay, or, 
when deriving a stroboscopic map from synchronously 
sampled data of a periodically driven system, one might 
use the redundancy to optimize the signal to noise ratio. 

In both applications the mathematics is the same: One 
constructs the covariance matrix of all data vectors (e.g. 
in an m-dimensional time delay embedding space), 
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and computes its singular vectors. Then one projects 
onto the m-dimensional vectors corresponding to the q 
largest singular values. To work with flow data, q should 
be at least the correct embedding dimension, and m con- 
siderably larger (e.g. m — 2q or larger). The result is a 
vector valued time series, and in [22] the relation of these 
components to temporal derivatives on the one hand and 
to Fourier components on the other hand were discussed. 
If, in the non-autonomous case, one wants to compress 
flow data to map data, q = 1. In this case, the redun- 
dancy of the flow is implicitly used for noise reduction 
of the map data. The routine svd can be used for both 
purposes. 



III. VISUALIZATION, NON-STATIONARITY 



A. Recurrence plots 



this method suffers more from noise, since for small time 
derivatives (i.e. close to the extrema) additional extrema 
can be produced by perturbations. Another aspect for 
the choice of the surface of section is that one should 
try to maximize the variance of the data inside the sec- 
tion, since their absolute noise level is independent of 
the section. One last remark: Time intervals between 
intersections are phase space observables as well [36] and 
the embedding theorems are thus valid. For time series 
with pronounced spikes, one often likes to study the se- 
quence of interspike time intervals, e.g. in cardiology the 
RR-intcrvals. If these time intervals are constructed in a 
way to yield time intervals of a Poincare map, they are 
suited to reflect the deterministic structure (if any). For 
complications see [36]. 

For a periodically driven non- autonomous system the 
best surface of section is usually given by a fixed phase 
of the driving term, which is also called a stroboscopic 
view. Here again the selection of the phase should be 



Recurrence plots are a useful tool to identify structure 
in a data set in a time resolved way qualitatively. This 
can be intermittency (which one detects also by direct 
inspection), the temporary vicinity of a chaotic trajectory 
to an unstable periodic orbit, or non-stationarity. They 
were introduced in [37] and investigated in much detail 
in [38], where you find many hints on how to interprete 
the results. Our routine recurr simply scans the time 
series and marks each pair of time indices with a 

black dot, whose corresponding pair of delay vectors has 
distance < e. Thus in the (i, j)-plane, black dots indicate 
closeness. In an ergodic situation, the dots should cover 
the plane uniformly on average, whereas non-stationarity 
expresses itself by an overall tendency of the dots to be 
close to the diagonal. Of course, a return to a dynamical 
situation the system was in before becomes evident by 
a black region far away from the diagonal. In Fig. 5, a 
recurrence plot is used to detect transient behavior at the 
beginning of a longer recording. 
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FIG. 5. Recurrence plot for Poincare section data from a 
vibrating string experiment [39]. Above the diagonal an em- 
bedding in two dimensions was used while below the diago- 
nal, scalar time series values were compared. In both cases 
the lighter shaded region at the beginning of the recording 
indicates that these data are dynamically distinct from the 
rest. In this particular case this was due to adjustments in 
the measurement apparatus. 

For the purpose of stationary testing, the recurrence 
plot is not particularly sensitive to the choice of embed- 
ding. The contrast of the resulting images can be se- 
lected by the distance e and the percentage of dots that 
should be actually plotted. Various software involving 
the color rendering and quantification of recurrence plots 
is offered in DOS executable form by Webber [40]. The 
interpretation of the often intriguing patterns beyond the 
detection and study of non-stationarity is still an open 
question. For suggestions for the study of nonstationary 
signals see [3] and references given there. 
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FIG. 6. Space-time separation plot of the CO2 laser data. 
Shown are lines of constant probability density of a point to 
be e-neighbor of the current point if its temporal distance is 
St. Probability densitites are 1/10 to 1 with increments of 
1/10 from bottom to top. Clear correlations are visible. 

package is stp, see Fig. 6. 



IV. NONLINEAR PREDICTION 

To think about predictability in time series data is 
worth while even if one is not interested in forecasts at 
all. Predictability is one way how correlations between 
data express themselves. These can be linear correla- 
tions, nonlinear correlations, or even deterministic con- 
traints. Questions related to those relevant for predic- 
tions will reappear with noise reduction and in surrogate 
data tests, but also for the computation of Lyapunov ex- 
ponents from data. Prediction is discussed in most of the 
general nonlinear time series references, in particular, a 
nice collection of articles can be found in [17]. 



A. Model validation 



B. Space-time separation plot 

While the recurrence plot shows absolute times, the 
space-time separation plot introduced by Provenzale et 
al. [41] integrates along parallels to the diagonal and thus 
only shows relative times. One usually draws lines of 
constant probability per time unit of a point to be an 
e-neighbor of the current point, when its time distance is 
St. This helps identifying temporal correlations inside the 
time series and is relevant to estimate a reasonable delay 
time, and, more importantly, the Theiler-window w in 
dimension and Lyapunov-analysis (see Sec. VII). Said in 
different words, it shows how large the temporal distance 
between points should be so that we can assume that 
they form independent samples according to the invari- 
ant measure. The corresponding routine of the TISEAN 



Before entering the methods, we have to discuss how 
to assess the results. The most obvious quantity for the 
quantification of predictability is the average forecast er- 
ror, i.e. the root of the mean squared (rms) deviation of 
the individual prediction from the actual future value. If 
it is computed on those values which were also used to 
construct the model (or to perform the predictions), it is 
called the in-sample error. It is always advisable to save 
some data for an out-of-sample test. If the out-of-samplc 
error is considerably larger than the in-sample error, data 
are either non-stationary or one has overfitted the data, 
i.e. the fit extracted structure from random fluctuations. 
A model with less parameters will then serve better. In 
cases where the data base is poor, on can apply complete 
cross-validation or take-one-out statistics, i.e. one con- 
structs as many models as one performs forecasts, and in 
each case ignores the point one wants to predict. By con- 
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FIG. 7. Predictions k time steps ahead (no iterated pre- 
dictions) using the program zeroth. Top curve: embed- 
ding dimension two is insufficient, since these flow data fill 
a (2+e)-dimensional attractor. Second from top: Although 
embedding dimension four should in theory be a good em- 
bedding, t = 1 suppresses structure perpendicular to the di- 
agonal so that the predictions are as bad as in m = 2! Lower 
curves: m = 3 and 4 with a delay of about 4-8 time units 
serve well. 



struction, this method is realized in the local approaches, 
but not in the global ones. 

The most significant, but least quantitative way of 
model validation is to iterate the model and to compare 
this synthetic time series to the experimental data. If 
they are compatible (e.g. in a delay plot), then the model 
is likely to be reasonable. Quantitatively, it is not easy to 
define the compatibility. One starts form an observed de- 
lay vector as intial condition, performs the first forecast, 
combines the forecast with all but the last components 
of the initial vector to a new delay vector, performs the 
next forecast, and so on. The resulting time series should 
then be compared to the measured data, most easily the 
attractor in a delay representation. 



B. Simple nonlinear prediction 

Conventional linear prediction schemes average over all 
locations in phase space when they extract the correla- 
tions they exploit for predictability. Tong [42] promoted 
an extension that fits different linear models if the cur- 
rent state is below or above a given threshold (TAR, 
Threshold Autoregressive Model). If we expect more 
than a slight nonlinear component to be present, it is 
preferable to make the approximation as local in phase 
space as possible. There have been many similar sugges- 
tions in the literature how to exploit local structure, see 
e.g. [43-46]. The simplest approach is to make the ap- 
proximation local but only keep the zeroth order, that is, 
approximate the dynamics locally by a constant. In the 
TISEAN package we include such a robust and simple 
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method: In a delay embedding space, all neighbors of s„ 
are saught, if we want to predict the measurements at 
time n + k. The forecast is then simply 



1 



Sj+k ■ 



(7) 



i.e. the average over the "futures" of the neighbors. The 
average forecast errors obtained with the routine zeroth 
(predict would give similar results) for the laser output 
data used in Fig. 4 as a function of the number k of steps 
ahead the predictions are made is shown in Fig. 7. One 
can also iterate the predictions by using the time series 
as a data base. 

Apart from the embedding parameters, all that has 
to be specified for zeroth order predictions is the size of 
the neighborhoods. Since the diffusive motion below the 
noise level cannot be predicted anyway, it makes sense 
to select neighborhoods which are at least as large as the 
noise level, maybe two or three times larger. For fairly 
clean time series, this guideline may result in neighbor- 
hoods with very few points. Therefore zeroth also per- 
mits to specify the minimal number of neighbors to base 
the predictions on. 

A relevant modification of this method is to extend the 
neighborhood U to infinity, but to introduce a distance 
dependent weight, 



T, j7 in W (\ S n 



(8) 



where w is called the kernel. For w(z) — 0(e — z) where 
O is the Heaviside step function, we return to Eq.(7). 
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C. Finding unstable periodic orbits 

As an application of simple nonlinear phase space pre- 
diction, let us discuss a method to locate unstable peri- 
odic orbits embedded in a chaotic attractor. This is not 
the place to review the existing methods to solve this 
problem, some references include [47-50]. The TISEAN 
package contains a routine that implements the require- 
ment that for a period p orbit {s„,n = 1, . . . ,p} of a 
dynamical system like Eq.(2) acting on delay vectors 

s„+i=f(s„), n=l,...,p, s p+ i=si. (9) 

With unit delay, the p delay vectors contain p different 
scalar entries, and Eq.(9) defines a root of a system of p 
nonlinear equations in p dimensions. Multidimensional 
root finding is not a simple problem. The standard New- 
ton method has to be augmented by special tricks in or- 
der to converge globally. Some such tricks, in particular 
means to select different solutions of Eq.(9) arc imple- 
mented in [50] . Similarly to the problems encountered in 
nonlinear noise reduction, solving Eq.(9) exactly is par- 
ticularly problematic since f(-) is unknown and must be 
estimated from the data. In Ref. [49], approximate so- 
lutions are found by performing just one iteration of the 
Newton method for each available time series point. We 
prefer to look for a least squares solution by minimizing 

p 

^ ||s„ + i - f(s„)|| 2 , s p+ i=§i (10) 

n=l 

instead. The routine upo uses a standard Levenberg- 
Marquardt algorithm to minimize (10). For this it is 
necessary that f(-) is smooth. Therefore we cannot use 
the simple nonlinear predictor based on locally constant 
approximations and we have to use a smooth kernel ver- 
sion, Eq.(8), instead. With w{z) = exp(— z 2 / 2h 2 ) , the 
kernel bandwidth h determines the degree of smoothness 
of f(-). Trying to start the minimization with all avail- 
able time series segments will produce a number of false 
minima, depending on the value of h. These have to be 
distinguished from the true solutions by inspection. On 
the other hand, we can reach solutions of Eq.(9) which 
are not closely visited in the time series at all, an impor- 
tant advantage over close return methods [47]. 

It should be noted that, depending on ft,, we may al- 
ways find good minima of (8), even if no solution of 
Eq.(9), or not even a truly deterministic dynamics ex- 
ists. Thus the finding of unstable periodic orbits in itself 
is not a strong indicator of determinism. We may how- 
ever use the cycle locations or stabilities as a discrimi- 
nating statistics in a test for nonlinearity, see Sec. VIII. 
While the orbits themselves are found quite easily, it is 
surprisingly difficult to obtain reliable estimates of their 
stability in the presence of noise. In upo, a small pertur- 
bation is iterated along the orbit and the unstable eigen- 
value is determined by the rate of its separation from the 
periodic orbit. 



The user of upo has to specify the embedding dimen- 
sion, the period (which may also be smaller) and the 
kernel bandwidth. For efficiency, one may choose to skip 
trials with very similar points. Orbits are counted as dis- 
tinct only when they differ by a specified amount. The 
routine finds the orbits, their expanding eigenvalue, and 
possible sub-periods. Figure 8 shows the determination 
of all period six orbits from 1000 iterates of the Hcnon 
map, contaminated by 10% Gaussian white noise. 



D. Locally linear prediction 

If there is a good reason to assume that the relation 
s n +i = f(s n ) is fulfilled by the experimental data in good 
approximation (say, within 5%) for some unknown / and 
that / is smooth, predictions can be improved by fitting 
local linear models. They can be considered as the lo- 
cal Taylor expansion of the unknown /, and are easily 
determined by minimizing 

with respect to a„ and b n , where U n is the e- 
neighborhood of s„, excluding s„ itself, as before. Then, 
the prediction is s„+i = a„s„ + b n . The minimiza- 
tion problem can be solved through a set of coupled lin- 
ear equations, a standard linear algebra problem. This 
scheme is implemented in onestep. For moderate noise 
levels and time series lengths this can give a reasonable 
improvement over zeroth and predict. Moreover, as 
discussed in Sec. VI, these linear maps are needed for the 
computation of the Lyapunov spectrum. Locally linear 
approximation was introduced in [45,46]. We should note 
that the straight least squares solution of Eq.(ll) is not 
always optimal and a number of strategies are available to 
regularize the problem if the matrix becomes nearly sin- 
gular and to remove the bias due to the errors in the "in- 
dependent" variables. These strategies have in common 
that any possible improvement is bought with consider- 
able complication of the procedure, requiring subtle pa- 
rameter adjustments. We refer the reader to Refs. [51,52] 
for advanced material. 

In Fig. 9 we show iterated predictions of the Poincare 
map data from the CO2 laser (Fig. 4) in a delay repre- 
sentation (using nstep in two dimensions). The resulting 
data do not only have the correct marginal distribution 
and power spectrum, but also form a perfect skeleton of 
the original noisy attractor. There are of course artefacts 
due to noise and the roughness of this approach, but there 
are good reasons to assume that the line- like substructure 
reflects fractality of the unperturbed system. 

Casdagli [53] suggested to use local linear models as a 
test for nonlinearity: He computed the average forecast 
error as a function of the neighborhood size on which the 
fit for a„ and b n is performed. If the optimum occurs 
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FIG. 9. Time delay representation of 5000 iterations of the 
local linear predictor nstep in two dimensions, starting from 
the last delay vector of Fig. 4. 

at large neighborhood sizes, the data are (in this embed- 
ding space) best described by a linear stochastic process, 
whereas an optimum at rather small sizes supports the 
idea of the existence of a nonlinear almost deterministic 
equation of motion. This protocol is implemented in the 
routine 11-ar, see Fig. 10. 
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FIG. 10. The Casdagli test for nonlinearity: The rms pre- 
diction error of local linear models as a function of the neigh- 
borhood size e. Lower curve: The CO2 laser data. These data 
are obviously highly deterministic in m=4 dimensions and 
with lag r=6. Central curve: The breath rate data shown in 
Fig. 12 with m=4 and t—1. Determinism is weaker (presum- 
ably due to a much higher noise level), but still the nonlinear 
structure is dominant. Upper curve: Numerically generated 
data of an AR(5) process, a linearly correlated random pro- 
cess (m=5, r=l). 



E. Global function fits 

The local linear fits are very flexible, but can go wrong 
on parts of the phase space where the points do not span 
the available space dimensions and where the inverse of 
the matrix involved in the solution of the minimization 
does not exist. Moreover, very often a large set of differ- 
ent linear maps is unsatisfying. Therefore many authors 
suggested to fit global nonlinear functions to the data, 
i.e. to solve 

a 2 = 5>„+i - / p (s„)) 2 , (12) 

n 

where f p is now a nonlinear function in closed form with 
parameters p, with respect to which the minimization 
is done. Polynomials, radial basis functions, neural nets, 
orthogonal polynomials, and many other approaches have 
been used for this purpose. The results depend on how 
far the chosen ansatz f p is suited to model the unknown 
nonlinear function, and on how well the data are de- 
terministic at all. We included the routines rbf and 
polynom in the TISEAN package, where f p is modeled 
by radial basis functions [54,55] and polynomials [56], 
respectively. The advantage of these two models is that 
the parameters p occur linearly in the function / and can 
thus be determined by simple linear algebra, and the so- 
lution is unique. Both features are lost for models where 
the parameters enter nonlinearly. 

In order to make global nonlinear predictions, one has 
to supply the embedding dimension and time delay as 
usual. Further, for polynom the order of the polynomial 



has to be given. The program returns the coefficients of 
the model. In rbf one has to specify the number of basis 
functions to be distributed on the data. The width of 
the radial basis functions (Lorentzians in our program) 
is another parameter, but since the minimization is so 
fast, the program runs many trial values and returns pa- 
rameters for the best. Figure 11 shows the result of a 
fit to the CO2 laser time series (Fig. 4) with radial basis 
functions. 

If global models are desired in order to infer the struc- 
ture and properties of the underlying system, they should 
be tested by iterating them. The prediction errors, al- 
though small in size, could be systematic and thus repel 
the iterated trajectory from the range where the original 
data are located. It can be useful to study a dependence 
of the size or the sign of the prediction errors on the 
position in the embedding space, since systematic errors 
can be reduced by a different model. Global models are 
attractive because they yield closed expressions for the 
full dynamics. One must not forget, however, that these 
models describe the observed process only in regions of 
the space which have been visited by the data. Outside 
this area, the shape of the model depends exclusively 
on the chosen ansatz. In particular, polynomials diverge 
outside the range of the data and hence can be unstable 
under iteration. 
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A. Simple nonlinear noise reduction 
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FIG. 11. Attractor obtained by iterating the model that 
has been obtained by a fit with 40 radial basis functions in 
two dimensions to the time series shown in Fig. 4. Compare 
also Fig. 9. 



V. NONLINEAR NOISE REDUCTION 



Filtering of signals from nonlinear systems requires the 
use of special methods since the usual spectral or other 
linear filters may interact unfavorably with the nonlin- 
ear structure. Irregular signals from nonlinear sources 
exhibit genuine broad band spectra and there is no jus- 
tification to identify any continuous component in the 
spectrum as noise. Nonlinear noise reduction does not 
rely on frequency information in order to define the dis- 
tinction between signal and noise. Instead, structure in 
the reconstructed phase space will be exploited. Gen- 
eral serial dependencies among the measurements {s n } 
will cause the delay vectors {s„} to fill the available Tri- 
dimensional embedding space in an inhomogeneous way. 
Linearly correlated Gaussian random variables will for 
example be distributed according to an anisotropic multi- 
variate Gaussian distribution. Linear geometric filtering 
in phase space seeks to identify the principal directions 
of this distribution and project onto them, see Sec. HE. 
Nonlinear noise reduction takes into account that nonlin- 
ear signals will form curved structures in delay space. In 
particular, noisy deterministic signals form smeared-out 
lower dimensional manifolds. Nonlinear phase space fil- 
tering seeks to identify such structures and project onto 
them in order to reduce noise. 

There is a rich literature on nonlinear noise reduction 
methods. Two articles of review character are avail- 
able, one by Kostelich and Schreiber [57], and one by 
Davies [58]. We refer the reader to these articles for fur- 
ther references and for the discussion of approaches not 
described in the present article. Here we want to con- 
centrate on two approaches that represent the geometric 
structure in phase space by local approximation. The 
first and simplest does so to constant order, the more 
sophisticated uses local linear subspaces plus curvature 
corrections. 



The simplest nonlinear noise reduction algorithm we 
know of replaces the central coordinate of each embed- 
ding vector by the local average of this coordinate. This 
amounts to a locally constant approximation of the dy- 
namics and is based on the assumption that the dynam- 
ics is continuous. The algorithm is described in [59], a 
similar approach is proposed in [43]. In an unstable, for 
example chaotic, system, it is essential not to replace the 
first and last coordinates of the embedding vectors by lo- 
cal averages. Due to the instability, initial errors in these 
coordinates are magnified instead of being averaged out. 

This noise reduction scheme is implemented quite eas- 
ily. First an embedding has to be chosen. Except for ex- 
tremely ovcrsampled data, it is advantageous to choose 
a short time delay. The program lazy always uses unit 
delay. The embedding dimension m should be chosen 
somewhat higher than that required by the embedding 
theorems. Then for each embedding vector {s„}, a neigh- 
borhood is formed in phase space containing all 
points {s„/} such that ||s„ — s n /|| < e. The radius of 
the neighborhoods e should be taken large enough in or- 
der to cover the noise extent, but still smaller than a 
typical curvature radius. These conditions cannot al- 
ways be fulfilled simultaneously, in which case one has 
to repeat the process with several choices and carefully 
evaluate the results. If the noise level is substantially 
smaller than the typical radius of curvature, neighbor- 
hoods of radius about 2-3 times the noise level gave the 
best results with artificial data. For each embedding vec- 
tor s„ = (s„_( m _i), . . . , s„) (the delay time has been set 
to unity), a corrected middle coordinate s„- m /2 is com- 

(n) 

putcd by averaging over the neighborhood LQ : 

Sn-m/2 = („) ^2 S n'-rn/2 • (13) 

After one complete sweep through the time series, all 
measurements s n are replaced by the corrected values 
s n . Of course, for the first and last (m — l)/2 points 
(if m is odd), no correction is available. The average 
correction can be taken as a new neighborhood radius for 
the next iteration. Note that the neighborhood of each 
point at least contains the point itself. If that is the only 
member, the average Eq.(13) is simply the uncorrected 
measurement and no change is made. Thus one can safely 
perform multiple iterations with decreasing values of e 
until no further change is made. 

Let us illustrate the use of this scheme with an ex- 
ample, a recording of the air flow through the nose of a 
human as an indicator of breath activity. (The data is 
part of data set B of the Santa Fe time series contest held 
in 1991/92 [17], see Rigney et al. [60] for a description.) 
The result of simple nonlinear noise reduction is shown 
in Fig. 12. 
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FIG. 12. Simple nonlinear noise reduction of human breath 
rate data. Three iterations have been carried out, staring with 
neighborhoods of size 0.4 units. Embeddings in 7 dimensions 
at unit delay have been used. Arguably, the resulting series 
(lower panel) is less noisy. However, in Sec. VIII we will show 
evidence that the noise is not just additive and independent 
of the signal. 



B. Locally projective nonlinear noise reduction 

A more sophisticated method makes use of the hy- 
potheses that the measured data is composed of the out- 
put of a low-dimensional dynamical system and of ran- 
dom or high-dimensional noise. This means that in an 
arbitrarily high-dimensional embedding space the deter- 
ministic part of the data would lie on a low-dimensional 
manifold, while the effect of the noise is to spread the 
data off this manifold. If we suppose that the amplitude 
of the noise is sufficiently small, we can expect to find 
the data distributed closely around this manifold. The 
idea of the projective nonlinear noise reduction scheme 
is to identify the manifold and to project the data onto 
it. The strategies described here go back to Ref. [61]. A 
realistic case study is detailed in Ref. [62]. 

Suppose the dynamical system, Eq. (1) or Eq. (2), form 
a g-dimcnsional manifold M. containing the trajectory. 
According to the embedding theorems, there exists a one- 
to-one image of the attractor in the embedding space, if 



the embedding dimension is sufficiently high. Thus, if 
the measured time series were not corrupted with noise, 
all the embedding vectors s„ would lie inside another 
manifold M in the embedding space. Due to the noise 
this condition is no longer fulfilled. The idea of the locally 
projective noise reduction scheme is that for each s n there 
exists a correction 0„, with ||0„|| small, in such a way 
that s„ — 0„ G M. and that n is orthogonal on M. 
Of course a projection to the manifold can only be a 
reasonable concept if the vectors are embedded in spaces 
which are higher dimensional than the manifold M.. Thus 
we have to over-embed in m-dimensional spaces with m > 

q- 

The notion of orthogonality depends on the metric 
used. Intuitively one would think of using the Euclidean 
metric. But this is not necessarily the best choice. The 
reason is that we are working with delay vectors which 
contain temporal information. Thus even if the middle 
parts of two delay vectors are close, the late parts could 
be far away from each other due to the influence of the 
positive Lyapunov exponents, while the first parts could 
diverge due the negative ones. Hence it is usually de- 
sirable to correct only the center part of delay vectors 
and leave the outer parts mostly unchanged, since their 
divergence is not only a consequence of the noise, but 
also of the dynamics itself. It turns out that for most ap- 
plications it is sufficient to fix just the first and the last 
component of the delay vectors and correct the rest. This 
can be expressed in terms of a metric tensor P which we 
define to be [61] 



>..-/! : * 
v ~ \ : e 



j and 1 < i, j < m 
elsewhere 



(14) 



where m is the dimension of the "over-embedded" delay 
vectors. 

Thus we have to solve the minimization problem 



(©iP -1 ©,) = min 



(15) 



with the constraints 



aj, (s„ - 0„) +bi = 



and 



for i = q + 1, . . . , m (16) 



(17) 



where the a l n are the normal vectors of M. at the point 

/ 1 • 

This ideas are realized in the programs ghkss, 
project, and noise in TISEAN. While the first two work 
as a posteriori filters on complete data sets, the last one 
can be used in a data stream. This means that it is 
possible to do the corrections online, while the data is 
coming in (for more details see section VC). All three 
algorithms mentioned above correct for curvature effects. 
This is done by either post-processing the corrections for 
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the delay vectors (ghkss) or by preprocessing the centres 
of mass of the local neighborhoods (project). 

The idea used in the ghkss program is the following. 
Suppose the manifold were strictly linear. Then, pro- 
vided the noise is white, the corrections in the vicinity 
of a point on the manifold would point in all directions 
with the same probability. Thus, if we added all the cor- 
rections we expect them to sum to zero (or (0) = O). 
On the other hand, if the manifold is curved, we ex- 
pect that there is a trend towards the centre of curva- 
ture ((0) = av ). Thus, to correct for this trend each 
correction is replaced by — av . 

A different strategy is used in the program project. 
The projections are done in a local coordinate system 
which is defined by the condition that the average of the 
vectors in the neighborhood is zero. Or, in other words, 
the origin of the coordinate systems is the centre of mass 
(s n )u of the neighborhood U. This centre of mass has 
a bias towards the centre of the curvature [2] . Hence, a 
projection would not lie on the tangent at the manifold, 
but on a secant. Now we can compute the centre of mass 
of these points in the neighborhood of s„. Let us call 
it {(s n ))u. Under fairly mild assumptions this point has 
twice the distance from the manifold then (s n )u- To cor- 
rect for the bias the origin of the local coordinate system 
is set to the point: ({s n ))u - 2(s n )u- 

The implementation and use of locally projective noise 
reduction as realized in project and ghkss is described 
in detail in Refs. [61,62]. Let us recall here the most 
important parameters that have to be set individually 
for each time series. The embedding parameters arc usu- 
ally chosen quite differently from other applications since 
considerable over-embedding may lead to better noise av- 
eraging. Thus, the delay time is preferably set to unity 
and the embedding dimension is chosen to provide em- 
bedding windows of reasonable lengths. Only for highly 
oversampled data (like the magneto-cardiogram, Fig. 15, 
at about 1000 samples per cycle), larger delays are nec- 
essary so that a substantial fraction of a cycle can be 
covered without the need to work in prohibitively high 
dimensional spaces. Next, one has to decide how many 
dimensions q to leave for the manifold supposedly con- 
taining the attractor. The answer partly depends on the 
purpose of the experiment. Rather brisk projections can 
be optimal in the sense of lowest residual deviation from 
the true signal. Low rms error can however coexist with 
systematic distortions of the attractor structure. Thus 
for a subsequent dimension calculation, a more conserva- 
tive choice would be in order. Remember however that 
points arc only moved towards the local linear subspace 
and too low a value of q does not do as much harm as 
may be though. 

The noise amplitude to be removed can be selected to 
some degree by the choice of the neighborhood size. In 
fact, nonlinear projective filtering can be seen indepen- 
dently of the dynamical systems background as filtering 
by amplitude rather than by frequency or shape. To al- 
low for a clear separation of noise and signal directions 
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FIG. 13. Two-dimensional representation of the NMR 
Laser data (top) and the result of the ghkss algorithm (bot- 
tom) after three iterations. 

locally, neighborhoods should be at least as large as the 
supposed noise level, rather larger. This of course com- 
petes with curvature effects. For small initial noise levels, 
it is recommended to also specify a minimal number of 
neighbors in order to permit stable linearizations. Fi- 
nally, we should remark that in successful cases most 
of the filtering is done within the first one to three it- 
erations. Going further is potentially dangerous since 
further corrections may lead mainly to distortion. One 
should watch the rms correction in each iteration and 
stop as soon as it doesn't decrease substantially any 
more. 

As an example for nonlinear noise reduction we treat 
the data obtained from an NMR laser experiment [63]. 
Enlargements of two-dimensional delay representations 
of the data are shown in Fig. 13. The upper panel shows 
the raw experimental data which contains about 1.1% of 
noise. The lower panel was produced by applying three 
iterations of the noise reduction scheme. The embedding 
dimension was m — 7, the vectors were projected down 
to two dimensions. The size of the local neighborhoods 
were chosen such that at least 50 neighbors were found. 
One clearly sees that the fractal structure of the attractor 
is resolved fairly well. 
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FIG. 14. Two-dimensional representation of a pure Gaus- 
sian process (top) and the outcome of the ghkss algorithm 
(bottom) after 10 iterations. Projections from m = 7 down 
to two dimensions were performed. 

The main assumption for this algorithm to work is that 
the data is well approximated by a low-dimensional mani- 
fold. If this is not the case it is unpredictable what results 
are created by the algorithm. In the absence of a real 
manifold, the algorithm must pick statistical fluctuations 
and spuriously interprets them as structure. Figure 14 
shows a result of the ghkss program for pure Gaussian 
noise. The upper panel shows a delay representation of 
the original data, the lower shows the outcome of apply- 
ing the algorithm for 10 iterations. The structure created 
is purely artifical and has nothing to do with structures 
in the original data. This means that if one wants to 
apply one of the algorithms, one has to carefully study 
the results. If the assumptions underlying the algorithms 
are not fulfilled in principle anything can happen. One 
should note however, that the performance of the pro- 
gram itself indicates such spurious behavior. For data 
which is indeed well approximated by a lower dimensional 
manifold, the average corrections applied should rapidly 
decrease with each successful iteration. This was the case 
with the NMR laser data and in fact, the correction was 
so small after three iteration that we stopped the proce- 
dure. For the white noise data, the correction only de- 
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FIG. 15. Real time nonlinear projective filtering of a mag- 
neto-cardiogram time series. The top panel shows the unfil- 
tered data. Bottom: Two iterations were done using projec- 
tions from m — 10 down to q — 2 dimensions (delay 0.01 s). 
Neighborhoods were limited to a radius of 0.1 units (0.05 in 
the second iteration) and to maximally 200 points. Neighbors 
were only sought up to 5 s back in time. Thus the first 5 s of 
data are not filtered optimally and are not shown here. Since 
the output of each iteration leaps behind its input by one de- 
lay window the last 0.2 s cannot be processed given the data 
in the upper panel. 

creased at a rate that corresponds to a general shrinking 
of the point set, indicating a lack of convergence towards 
a genuine low dimensional manifold. Below, we will give 
an example where an approximating manifold is present 
without pure determinism. In that case, projecting onto 
the manifold does reduce noise in a reasonable way. See 
Ref. [64] for material on the dangers of geometric filter- 
ing. 



C. Nonlinear noise reduction in a data stream 

In Ref. [65], a number of modifications of the above 
procedure have been discussed which enable the use of 
nonlinear projective filtering in a data stream. In this 
case, only points in the past are available for the for- 
mation of neighborhoods. Therefore the neighbor search 
strategy has to be modified. Since the algorithm is de- 
scribed in detail in Ref. [65], we only give an example 
of its use here. Figure 15 shows the result of nonlin- 
ear noise reduction on a magneto-cardiogram (see Figs. 1 
and 3) with the program noise. The same program has 
also been used successfully for the extraction of the fetal 
ECG [66]. 
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VI. LYAPUNOV EXPONENTS 

Chaos arises from the exponential growth of infinites- 
imal perturbations, together with global folding mecha- 
nisms to guarantee boundedncss of the solutions. This 
exponential instability is characterized by the spectrum 
of Lyapunov exponents [67]. If one assumes a local de- 
composition of the phase space into directions with dif- 
ferent stretching or contraction rates, then the spectrum 
of exponents is the proper average of these local rates 
over the whole invariant set, and thus consists of as many 
exponents as there are space directions. The most promi- 
nent problem in time series analysis is that the physical 
phase space is unknown, and that instead the spectrum is 
computed in some embedding space. Thus the number of 
exponents depends on the reconstruction, and might be 
larger than in the physical phase space. Such additional 
exponents are called spurious, and there are several sug- 
gestions to either avoid them [68] or to identify them. 
Moreover, it is plausible that only as many exponents 
can be determined from a time series as are entering the 
Kaplan Yorke formula (see below). To give a simple ex- 
ample: Consider motion of a high-dimensional system 
on a stable limit cycle. The data cannot contain any 
information about the stability of this orbit against per- 
turbations, as long as they are exactly on the limit cycle. 
For transients, the situation can be different, but then 
data are not distributed according to an invariant mea- 
sure and the numerical values are thus difficult to inter- 
pret. Apart from these difficulties, there is one relevant 
positive feature: Lyapunov exponents are invariant un- 
der smooth transformations and are thus independent of 
the measurement function or the embedding procedure. 
They carry a dimension of an inverse time and have to 
be normalized to the sampling interval. 
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FIG. 16. Estimating the maximal Lyapunov exponent of 
the CO2 laser data. The top panel shows results for the 
Poincare map data, where the average time interval T av 
is 52.2 samples of the flow, and the straight line indicates 
A = 0.38. For comparison: The iteration of the radial ba- 
sis function model of Fig. 11 yields A=0.35. Bottom panel: 
Lyapunov exponents determined directly from the flow data. 
The straight line has slope A = 0.007. In good approxima- 
tion, A map = Afl ow T a v Here, the time window w to suppress 
correlated neighbors has been set to 1000, and the delay time 
was 6 units. 



A. The maximal exponent 

The maximal Lyapunov exponent can be determined 
without the explicit construction of a model for the time 
series. A reliable characterization requires that the inde- 
pendence of embedding parameters and the exponential 
law for the growth of distances are checked [69,70] ex- 
plicitly. Consider the representation of the time series 
data as a trajectory in the embedding space, and assume 
that you observe a very close return s„/ to a previously 
visited point s„. Then one can consider the distance 
Ao = s„ — s„/ as a small perturbation, which should 
grow exponentially in time. Its future can be read from 
the time series: A; = s„ + ; — s„/ + ;. If one finds that 
A; I « Aoe A ' then A is (with probability one) the maxi- 
mal Lyapunov exponent. In practice, there will be fluc- 
tuations because of many effects, which are discussed in 
detail in [69]. Based on this understanding, one can de- 
rive a robust consistent and unbiased estimator for the 
maximal Lyapunov exponent. One computes 



S(e,m,t) = (in £ \s n+t - s n , +t \jj . (18) 

If S(e, to, t) exhibits a linear increase with identical slope 
for all to larger than some m and for a reasonable range 
of e, then this slope can be taken as an estimate of the 
maximal exponent Ai. 

The formula is implemented in the routines lyap_k 
and lyapunov in a straightforward way. (The program 
lyap_r implements the very similar algorithm of Ref. [70] 
where only the closest neighbor is followed for each ref- 
erence point. Also, the Euclidean norm is used.) Apart 
from parameters characterizing the embedding, the ini- 
tial neighborhood size e is of relevance: The smaller e, 
the large the linear range of S, if there is one. Obviously, 
noise and the finite number of data points limit e from 
below. The default values of lyap Jc are rather reasonable 
for map-like data. It is not always necessary to extend 
the average in Eq.(18) over the whole available data, rea- 
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sonable averages can be obtained already with a few hun- 
dred reference points s„ . If some of the reference points 
have very few neighbors, the corresponding inner sum 
in Eq.(18) is dominated by fluctuations. Therefore one 
may choose to exclude those reference points which have 
less than, say, ten neighbors. However, discretion has to 
be applied with this parameter since it may introduce a 
bias against sparsely populated regions. This could in 
theory affect the estimated exponents due to multifrac- 
tality. Like other quantities, Lyapunov estimates may be 
affected by serial correlations between reference points 
and neighbors. Therefore, a minimum time for \n — n'\ 
can and should be specified here as well. See also Sec. VII. 

Let us discuss a few typical outcomes. The data un- 
derlying the top panel of Fig. 16 are the values of the 
maxima of the CO2 laser data. Since this laser exhibits 
low dimensional chaos with a reasonable noise level, we 
observe a clear linear increase in this semi-logarithmic 
plot, reflecting the exponential divergence of nearby tra- 
jectories. The exponent is A « 0.38 per iteration (map 
data!), or, when introducing the average time interval, 
0.007 per fis. In the bottom panel we show the result for 
the same system, but now computed on the original flow- 
like data with a sampling rate of 1 MHz. As additional 
structure, an initial steep increase and regular oscillations 
are visible. The initial increase is due to non-normality 
and effects of alignment of distances towards the locally 
most unstable direction, and the oscillations are an effect 
of the locally different velocities and thus different den- 
sities. Both effects can be much more dramatic in less 
favorable cases, but as long as the regular oscillations 
possess a linearly increasing average, this can be taken 
as the estimate of the Lyapunov exponent. Normaliz- 
ing by the sampling rate, we again find A « 0.007 per 
/«s, but it is obvious that the linearity is less pronounced 
then for the map-like data. Finally, we show in Fig. 17 
an example of a negative result: We study the human 
breath rate data used before. No linear part exists, and 
one cannot draw any reasonable conclusion. It is worth 
considering the figure on a doubly logarithmic scale in 
order to detect a power law behavior, which, with power 
1/2, could be present for a diffusive growth of distances. 
In this particular example, there is no convincing power 
law either. 



B. The Lyapunov spectrum 

The computation of the full Lyapunov spectrum re- 
quires considerably more effort than just the maximal 
exponent. An essential ingredient is some estimate of 
the local Jacobians, i.e. of the linearized dynamics, which 
rules the growth of infinitesimal perturbations. One ei- 
ther finds it from direct fits of local linear models of the 
type s„+i = a„s„+6„, such that the first row of the Jaco- 
bian is the vector a„, and (J)ij = for i = 2, . . . , m, 

where m is the embedding dimension. The a„ is given by 
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FIG. 17. The breath rate data (c.f. Fig. 12) exhibit no 
linear increase, reflecting the lack of exponential divergence 
of nearby trajectories. 

the least squares minimization a 2 = J2i( s i+i~ a n s * — b n ) 2 
where {s;} is the set of neighbors of s„ [45,71]. Or one 
constructs a global nonlinear model and computes its lo- 
cal Jacobians by taking derivatives. In both cases, one 
multiplies the Jacobians one by one, following the tra- 
jectory, to as many different vectors in tangent space 
as one wants to compute Lyapunov exponents. Every 
few steps, one applies a Gram-Schmidt orthonormaliza- 
tion procedure to the set of u^, and accumulates the log- 
arithms of their rescaling factors. Their average, in the 
order of the Gram-Schmidt procedure, give the Lyapunov 
exponents in descending order. The routine lyap_spec 
uses this method, which goes back to [71] and [45], em- 
ploying local linear fits. Apart from the problem of spuri- 
ous exponents, this method contains some other pitfalls: 
It assumes that there exist well defined Jacobians, and 
does not test for their relevance. In particular, when at- 
tractors are thin in the embedding space, some (or all) of 
the local Jacobians might be estimated very badly. Then 
the whole product can suffer from these bad estimates 
and the exponents are correspondingly wrong. Thus the 
global nonlinear approach can be superior, if a modeling 
has been successful, see Sec. IV. 

In Table I we show the exponents of the stroboscopic 
NMR laser data in a three dimensional embedding as a 
function of the neighborhood size. Using global nonlin- 
ear models, we find the numbers given in the last two 
rows. More material is discussed in [2]. The spread of 
values in the table for this rather clean data set reflects 
the difficulty of estimating Lyapunov spectra from time 
series, which has to be done with great care. In partic- 
ular, when the algorithm is blindly applied to data from 
a random process, it cannot internally check for the con- 
sistency of the assumption of an underlying dynamical 
system. Therefore a Lyapunov spectrum is computed 
which now is completely meaningless. 

The computation of the first part of the Lyapunov 
spectrum allows for some interesting cross-checks. It was 
conjectured [72], and is found to be correct in most phys- 
ical situations, that the Lyapunov spectrum and the frac- 
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method 




Ai 


A 2 


A 3 


local linear 


k=20 


0.32 


-0.40 


-1.13 




fc=40 


0.30 


-0.51 


-1.21 




fc=160 


0.28 


-0.68 


-1.31 


radial basis functions 


0.27 


-0.64 


-1.31 


polynomial 




0.27 


-0.64 


-1.15 



TABLE I. Lyapunov exponents of the NMR laser data, 
determined with a three-dimensional embedding. The algo- 
rithms described in Sec. VIA give Ai = 0.3 ± 0.02 for the 
largest exponent. 



tal dimension of an attractor are closely related. If the 
expanding and least contracting directions in space are 
continuously filled and only one partial dimension is frac- 
tal, then one can ask for the dimensionality of a (fractal) 
volume such that it is invariant, i.e. such that the sum of 
the corresponding Lyapunov exponents vanishes, where 
the last one is weighted with the non-integer part of the 
dimension: 

D KY = k + ^p!^ > ( 19 ) 
|Afc+l| 

where k is the maximum integer such that the sum of 
the k largest exponents is still non-negative. D KY is 
conjectured to coincide with the information dimension. 

The Pesin identity is valid under the same assumptions 
and allows to compute the KS-cntropy: 

m 

h KS = J2 Q ( X i) X i- ( 2 °) 
i=l 



VII. DIMENSIONS AND ENTROPIES 

Solutions of dissipative dynamical systems cannot fill 
a volume of the phase space, since dissipation is syn- 
onymous with a contraction of volume elements under 
the action of the equations of motion. Instead, trajec- 
tories are confined to lower dimensional subsets which 
have measure zero in the phase space. These subsets can 
be extremely complicated, and frequently they possess a 
fractal structure, which means that they are in a nontriv- 
ial way self-similar. Generalized dimensions are one class 
of quantities to characterize this fractality. The Haus- 
dorff dimension is, from the mathematical point of view, 
the most natural concept to characterize fractal sets [67], 
whereas the information dimension takes into account 
the relative visitation frequencies and is therefore more 
attractive for physical systems. Finally, for the charac- 
terization of measured data, other similar concepts, like 
the correlation dimension, are more useful. One gen- 
eral remark is highly relevant in order to understand the 
limitations of any numerical approach: dimensions char- 
acterize a set or an invariant measure whose support is 



the set, whereas any data set contains only a finite num- 
ber of points representing the set or the measure. By 
definition, the dimension of a finite set of points is zero. 
When we determine the dimension of an attractor nu- 
merically, we extrapolate from finite length scales, where 
the statistics we apply is insensitive to the finiteness of 
the number of data, to the infinitesimal scales, where the 
concept of dimensions is defined. This extrapolation can 
fail for many reasons which will be partly discussed be- 
low. Dimensions are invariant under smooth transforma- 
tions and thus again computable in time delay embedding 
spaces. 

Entropies are an information theoretical concept to 
characterize the amount of information needed to pre- 
dict the next measurement with a certain precision. The 
most popular one is the Kolmogorov-Sinai entropy. We 
will discuss here only the correlation entropy, which can 
be computed in a much more robust way. The occur- 
rence of entropies in a section on dimensions has to do 
with the fact that they can be determined both by the 
same statistical tool. 



A. Correlation dimension 

Roughly speaking, the idea behind certain quantifiers 
of dimensions is that the weight p(e) of a typical e-ball 
covering part of the invariant set scales with its diameter 
like p(e) w e D , where the value for D depends also on 
the precise way one defines the weight. Using the square 
of the probability pi to find a point of the set inside the 
ball, the dimension is called the correlation dimension 
Z?2 , which is computed most efficiently by the correlation 
sum [73]: 

1 N 

C(m,e) = -—J2 E e(e-| 8j --s fc |), (21) 

r j—m k<j—w 

where s, are m-dimensional delay vectors, N pa i rs = (N — 
m + 1)(N — to — w + l)/2 the number of pairs of points 
covered by the sums, is the Heaviside step function and 
w will be discussed below. On sufficiently small length 
scales and when the embedding dimension m exceeds the 
box-dimension of the attractor [74] , 

C(m, e) oc e° 2 , (22) 

Since one does not know the box-dimension a priori, one 
checks for convergence of the estimated values of D 2 in 

TO. 

The literature on the correct and spurious estimation 
of the correlation dimension is huge and this is certainly 
not the place to repeat all the arguments. The relevant 
caveats and misconceptions are reviewed for example in 
Refs. [75,11,76,2]. The most prominent precaution is to 
exclude temporally correlated points from the pair count- 
ing by the so called Theiler window w [75]. In order to 
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become a consistent estimator of the correlation integral 
(from which the dimension is derived) the correlation sum 
should cover a random sample of points drawn indepen- 
dently according to the invariant measure on the attrac- 
tor. Successive elements of a time series are not usually 
independent. In particular for highly sampled flow data 
subsequent delay vectors are highly correlated. Theiler 
suggested to remove this spurious effect by simply ig- 
noring all pairs of points in Eq.(21) whose time indices 
differ by less than w, where w should be chosen gener- 
ously. With 0(N 2 ) pairs available, the loss of 0(wN) 
pairs is not dramatic as long as w <C N. At the very 
least, pairs with j = k have to be excluded [77], since 
otherwise the strong bias to D2 = 0, the mathematically 
correct value for a finite set of points, reduces the scaling 
range drastically. Choosing w, the first zero of the auto- 
correlation function, sometimes even the decay time of 
the autocorrelation function, are not large enough since 
they reflect only overall linear correlations [75,76]. The 
space-time-separation plot (Sec. IIIB) provides a good 
means of determining a sufficient value for w, as discussed 
for example in [41,2]. In some cases, notably processes 
with inverse power law spectra, inspection requires w to 
be of the order of the length of the time series. This 
indicates that the data does not sample an invariant at- 
tractor sufficiently and the estimation of invariants like 
Z?2 or Lyapunov exponents should be abandoned. 

Parameters in the routines d2, c2, and c2naive are as 
usual the embedding parameters m and r, the time de- 
lay, and the embedding dimension, as well as the Theiler 
window. 

Fast implementation of the correlation sum have been 
proposed by several authors. At small length scales, the 
computation of pairs can be done in 0(N log N) or even 
O(N) time rather than 0(N 2 ) without loosing any of 
the precious pairs, see Ref. [20]. However, for interme- 
diate size data sets we also need the correlation sum at 
intermediate length scales where neighbor searching be- 
comes expensive. Many authors have tried to limit the 
use of computational resources by restricting one of the 
sums in Eq.(21) to a fraction of the available points. By 
this practice, however, one looses valuable statistics at 
the small length scales where points are so scarce any- 
way that all pairs are needed for stable results. In [62], 
buth approaches were combined for the first time by us- 
ing fast neighbor search for e < eo and restricting the 
sum for e > eo- The TISEAN implementations c2 and 
d2 go one step further and select the range for the sums 
individually for each length scale to be processed. This 
turns out to give a major improvement in speed. The 
user can specify a desired number of pairs which seems 
large enough for a stable estimation of C(e), typically 
1000 pairs will suffice. Then the sums are extended to 
a range which guarantees that number of pairs, or, if 
this cannot be achieved, to the whole time series. At 
the largest length scales, this range may be rather small 
and the user may choose to give a minimal number of 
reference points to ensure a representative average. Still, 



using the program c2 the whole computation may thus at 
large scales be concentrated on the first part of the time 
series, which seems fair for stationary, non-intermittent 
data (nonstationary or strongly intermittent data is usu- 
ally unsuitable for correlation dimension estimation any- 
way). The program d2 is safer with this respect. Rather 
than restricting the range of the sums, only a randomly 
selected subset is used. The randomization however re- 
quires a more sophisticated program structure in order 
to avoid an overhead in computation time. 



1. Takens- Theiler estimator 

Convergence to a finite correlation dimension can be 
checked by plotting scale dependent "effective dimen- 
sions" versus length scale for various embeddings. The 
easiest way to proceed is to compute (numerically) the 
derivative of log C(m, e) with respect to log e, for exam- 
ple by fitting straight lines to the log-log plot of C(e). In 
Fig. 18 (a) we see the output of the routine c2 acting on 
data from the NMR laser, processed by c2d in order to 
obtain local slopes. By default, straight lines are fitted 
over one octave in e, larger ranges give smoother results. 
We can see that on the large scales, self-similarity is bro- 
ken due to the finite extension of the attractor, and on 
small but yet statistically significant scales we see the em- 
bedding dimension instead of a saturated, m-independent 
value. This is the effect of noise, which is infinite dimen- 
sional, and thus fills a volume in every embedding space. 
Only on the intermediate scales we see the desired plateau 
where the results are in good approximation independent 
of m and e. The region where scaling is established, not 
just the range selected for straight line fitting, is called 
the scaling range. 

Since the statistical fluctuations in plots like 
Fig. 18 (a) show characteristic (anti-)correlations, it has 
been suggested [78,79] to apply a maximum likelihood 
estimator to obtain optimal values for D2. The Takens- 
Theiler-estimator reads 

^> = jjfb (23) 

and can be obtained by processing the output of c2 
by c2t. Since C(e) is available only at discrete val- 
ues {ei,i = 0, . . . , /}, we interpolate it by a pure power 
law (or, equivalently, the log-log plot by straight lines: 
logC(e) = diloge + b{) in between these. The resulting 
integrals can be solved trivially and summed: 

JO 6 i=1 Jei-! 



i=i 



(24) 



Plotting Dtt versus e (Fig. 18 (b)) is an interesting al- 
ternative to the usual local slopes plot, Fig. 18 (a). It 
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is tempting to use such an "estimator of dimension" as 
a black box to provide a number one might quote as a 
dimension. This would imply the unjustified assumption 
that all deviations from exact scaling behavior is due to 
the statistical fluctuations. Instead, one still has to ver- 
ify the existence of a scaling regime. Only then, DTr(e) 
evaluated at the upper end of the scaling range is a rea- 
sonable dimension estimator. 



2. Gaussian kernel correlation integral 




The correlation sum Eq.(21) can be regarded as an av- 
erage density of points where the local density is obtained 
by a kernel estimator with a step kernel 0(e — r). A natu- 
ral modification for small point sets is to replace the sharp 
step kernel by a smooth kernel function of bandwidth e. 
A particularly attractive case that has been studied in 
the literature [80] is given by the Gaussian kernel, that 

is, 0(e — r) is replaced by . The resulting Gaussian 
kernel correlation sum Cc(e) has the same scaling prop- 
erties as the usual C(e). It has been observed in [3] that 
Cc(e) can be obtained from C(e) via 



C G (e) 



Wo 



de • 



(25) 



without having to repeat the whole computation. If C(e) 
is given at discrete values of e, the integrals in Eq.(25) 
can be carried out numerically by interpolating C (e) with 
pure power laws . This is done in c2g which uses a 15 
point Gauss-Kronrod rule for the numerical integration. 



B. Information dimension 





Another way of attaching weight to e-balls, which is 
more natural, is the probability p t itself. The result- 
ing scaling exponent is called the information dimension 
D\. Since the Kaplan- Yorke dimension of Sec. VI is an 
approximation of Di, the computation of D\ through 
scaling properties is a relevant cross-check for highly de- 
terministic data. D\ can be computed from a modified 
correlation sum, where, however, unpleasant systematic 
errors occur. The fixed mass approach [81] circumvents 
these problems, so that, including finite sample correc- 
tions [77], a rather robust estimator exists. Instead of 
counting the number of points in a ball one asks here for 
the diameter e which a ball must have to contain a cer- 
tain number k of points when a time series of length N 
is given. Its scaling with k and N yields the dimension 
in the limit of small length scales by 



D\{m) = lim 



dlogk/N 



k/N^o d{loge(k/N)) 



(26) 



The routine cl computes the (geometric) mean length 
scale exp(loge(fc/iV)) for which k neighbors are found in 
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FIG. 18. Dimension estimation for the (noise filtered) 
NMR laser data. Embedding dimensions 2 to 7 are shown. 
From above: (a) slopes are determined by straight line fits 
to the log- log plot of the correlation sum, Eq. (21). (b) 
Takes-Theiler estimator of the same slope, (c) Slopes are 
obtained by straight line fits to the Gaussian kernel correla- 
tion sum, Eq.(25). (d) Instead of the correlation dimension, 
it has been attempted to estimate the information dimension. 
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N data points, as a function of k/N. Unlike the corre- 
lation sum, finite sample corrections are necessary if k is 
small [77] . Essentially, the log of k has to be replaced by 
the digamma function ty(k). The resulting expression is 
implemented in cl. Given to and r, the routine varies k 
and N such that the largest reasonable range of k/N is 
covered with moderate computational effort. This means 
that for l/N < k/N < K/N (default: K = 100), all N 
available points are searched for neighbors and k is var- 
ied. For K/N < k/N < 1, k = K is kept fixed and N is 
decreased. The result for the NMR laser data is shown in 
Fig. 18 (d), where a nice scaling with D\ ss 1.35 can be 
discerned. For comparability, the logarithmic derivative 
of k/N is plotted versus exp(log e(k, N)} and not vice 
versa, although k/N is the independent variable. One 
easily detects again the violations of scaling discussed 
before: Cut-off on the large scales, noise on small scales, 
fluctuations on even smaller scales, and a scaling range 
in between. In this example, D x is close to D 2 , and mul- 
tifractality cannot be established positively. 



1. Entropy estimates 

The correlation dimension characterizes the e depen- 
dence of the correlation sum inside the scaling range. It is 
natural to ask what we can learn form its m-dependence, 
once to is larger than D . The number of e-ncighbors 
of a delay vector is an estimate of the local probability 
density, and in fact it is a kind of joint probability: All m- 
components of the neighbor have to be similar to those of 
the actual vector simultaneously. Thus when increasing 
to, joint probabilities covering larger time spans get in- 
volved. The scaling of these joint probabilities is related 
to the correlation entropy /12, such that 

C(m,e) » e D 'e- mh ' , (27) 

As for the scaling in e, also the dependence on to is valid 
only asymptotically for large to, which one will not reach 
due to the lack of data points. So one will study h^ifn) 
versus to and try to extrapolate to large to. The corre- 
lation entropy is a lower bound of the Kolmogorov Sinai 
entropy, which in turn can be estimated by the sum of 
the positive Lyapunov exponents. The program d2 pro- 
duces as output the estimates of h 2 directly, from the 
other correlation sum programs it has to be extracted by 
post-processing the output. 

The entropies of first and second order can be derived 
from the output of cl and c2 respectively. An alternate 
means of obtaining these and the other generalized en- 
tropies is by a box counting approach. Let pi be the 
probability to find the system state in box i, then the 
order q entropy is defined by the limit of small box size 
and large to of 

5>««e-™V (28) 



To evaluate J^iPi over a nne m esh of boxes in to ^> 1 
dimensions, economical use of memory is necessary: A 
simple histogram would take (l/e) m storage. Therefore 
the program boxcount implements the mesh of boxes as a 
tree with (l/e)-fold branching points. The tree is worked 
through recursively so that at each instance at most one 
complete branch exists in storage. The current version 
does not implement finite sample corrections to Eq.(28). 



VIII. TESTING FOR NONLINEARITY 

Most of the methods and quantities discussed so far are 
most appropriate in cases where the data show strong and 
consistent nonlinear deterministic signatures. As soon as 
more than a small or at most moderate amount of addi- 
tive noise is present, scaling behavior will be broken and 
predictability will be limited. Thus we have explored 
the opposite extreme, nonlinear and fully deterministic, 
rather than the classical linear stochastic processes. The 
bulk of real world time scries falls in neither of these lim- 
iting categories because they reflect nonlinear responses 
and effectively stochastic components at the same time. 
Little can be done for many of these cases with current 
methods. Often it will be advisable to take advantage of 
the well founded machinery of spectral methods and ven- 
ture into nonlinear territory only if encouraged by pos- 
itive evidence. This section is about methods to estab- 
lish statistical evidence for nonlinearity beyond a simple 
rescaling in a time series. 



A. The concept of surrogate data 

The degree of nonlinearity can be measured in sev- 
eral ways. But how much nonlinear predictability, say, 
is necessary to exclude more trivial explanations? All 
quantifiers of nonlinearity show fluctuations but the dis- 
tributions, or error bars if you wish, are not available 
analytically. It is therefore necessary to use Monte Carlo 
techniques to assess the significance of results. One im- 
portant method in this context is the method of surrogate 
data [82]. A null hypothesis is formulated, for example 
that the data has been created by a stationary Gaussian 
linear process, and then it is attempted to reject this 
hypothesis by comparing results for the data to appro- 
priate realizations of the null hypothesis. Since the null 
assumption is not a simple one but leaves room for free 
parameters, the Monte Carlo sample has to take these 
into account. One approach is to construct constrained 
realizations of the null hypothesis. The idea is that the 
free parameters left by the null are reflected by specific 
properties of the data. For example the unknown coeffi- 
cients of an autoregressive process are reflected in the au- 
tocorrelation function. Constrained realizations are ob- 
tained by randomizing the data subject to the constraint 
that an appropriate set of parameters remains fixed. For 
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FIG. 19. Upper: The human breath rate data from Fig. 12. 
Lower: the noise component extracted by the noise reduction 
scheme has been randomized in order to destroy correlations 
with the signal. The result appears slightly but significantly 
less structured than the original. 



example, random data with a given periodogram can be 
made by assuming random phases and taking the inverse 
Fourier transform of the given periodogram. Random 
data with the same distribution as a given data set can 
be generated by permuting the data randomly without 
replacement. Asking for a given spectrum and a given 
distribution at the same time poses already a much more 
difficult question. 



B. Iterative Fourier transform method 

Very few real time series which are suspected to show 
nonlincarity follow a Gaussian single time distribution. 
Non-Gaussianity is the simplest kind of nonlinear signa- 
ture but it may have a trivial reason: The data may have 
been distorted in the measurement process. Thus a pos- 
sible null hypothesis would be that there is a stationary 
Gaussian linear stochastic process that generates a se- 
quence {x n }, but the actual observations are s n — s(x n ) 
where s(-) is a monotonic function. Constrained realiza- 
tions of this null hypothesis would require the genera- 



tion of random sequences with the same power spectrum 
(fully specifying the linear process) and the same single 
time distribution (specifying the effect of the measure- 
ment function) as the observed data. The Amplitude 
Adjusted Fourier Transform (AAFT) method proposed 
in [82] attempts to invert the measurement function s(-) 
by rescaling the data to a Gaussian distribution. Then 
the Fourier phases are randomized and the rescaling is 
inverted. As discussed in [83], this procedure is biased 
towards a flatter spectrum since the inverse of s(-) is not 
available exactly. In the same reference, a scheme is in- 
troduced that removes this bias by iteratively adjusting 
the spectrum and the distribution of the surrogates. Al- 
ternatingly, the surrogates are rescaled to the exact val- 
ues taken by the data and then the Fourier transform is 
brought to the exact amplitudes obtained from the data. 
The discrepancy between both steps either converges to 
zero with the number of iterations or to a finite inaccu- 
racy which decreases with the length of the time series. 
The program surrogates performs iterations until no 
further improvement can be made. The last two stages 
are returned, one having the exact Fourier amplitudes 
and one taking on the same values as the data. For not 
too exotic data these two versions should be almost iden- 
tical. The relative discrepancy is also printed. 

In Fig. 19 we used this procedure to assess the hy- 
pothesis that the noise reduction on the breath data re- 
ported in Fig. 12 removed an additive noise component 
which was independent of the signal. If the hypothesis 
were true, we could equally well add back on the noise 
sequence or a randomized version of it which lacks any 
correlations to the signal. In the upper panel of Fig. 19 
we show the original data. In the lower panel we took the 
noise reduced version (c.f. Fig. 12, bottom) and added a 
surrogate of the supposed noise sequence. The result is 
similar but still significantly different from the original 
to make the additivity assumption implausible. 

Fourier based randomization schemes suffer from some 
caveats due to the the inherent assumption that the data 
constitutes one period of a periodic signal, which is not 
what we really expect. The possible artefacts are dis- 
cussed for example in [84] and can, in summary, lead to 
spurious rejection of the null hypothesis. One precaution 
that should be taken when using surrogates is to make 
sure that the beginning and the end of the data approx- 
imately match in value and phase. Then, the periodicity 
assumption is not too far wrong and harmless. Usually, 
this amounts to the loss of a few points of the series. One 
should note, however, that the routine may truncate the 
data by a few points itself in order to be able to perform 
a fast Fourier transform which requires the number of 
points to be factorizable by small prime factors. 
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FIG. 20. Upper trace: Data from a stationary Gaussian 
linear stochastic process (x n = 0.7x n -i + r/ n ) measured by 
s(x n ) = Xn- Samples 200-220 are an artefact. With the 
Fourier based scheme (middle trace) the artefact results in an 
increased number of spikes in the surrogates and reduced pre- 
dictability. In the lower trace, the artefact has been preserved 
along with the distribution of values and lags 1, . . . , 25 of the 
autocorrelation function. 



C. General constrained randomization 

In [85], a general method has been proposed to create 
random data which fulfill specified constraints. With this 
method, the artefacts and remaining imprecision of the 
Fourier based randomization schemes can be avoided by 
specifying the autocorrelation function rather than the 
Fourier transform. The former does not assume periodic 
continuation. Maybe more importantly, the restriction 
to a rather narrow null hypothesis can be relaxed since 
in principle arbitrary statistical observables can be im- 
posed on the surrogates. A desired property of the data 
has to be formulated in terms of a cost function which 
assumes an absolute minimum when the property is ful- 
filled. States arbitrarily close to this minimal cost can be 
reached by the method of simulated annealing. The cost 
function is minimised among all possible permutations of 
the data. See [85] for a description of the approach. 

The TISEAN package contains the building blocks 
for a library of surrogate data routines implementing 
user specified cost functions. Currently, only the au- 
tocorrelation function with and without periodic con- 
tinuation have been implemented. Further, a template 
is given from which the user may derive her/his own 
routines. A module is provided that drives the simu- 
lated annealing process through an exponential cooling 
scheme. The user may replace this module by other 
scheme of her/his choice. A module that performs ran- 
dom pair permutations is given which allows to exclude 
a list of points from the permutation scheme. More so- 
phisticated permutation schemes can be substituted if 
desired. Most importantly, the cost function has to be 
given as another module. The autocorrelation modules 
use max^ x |C(r) - C(r) data |/r, where C(t) is the au- 





FIG. 21. Randomization of 500 points generated by the the 
Henon map. (a) Original data; (b) Same autocorrelations 
and distribution; (c)-(f) Different stages of annealing with a 
cost function C involving three and four-point correlations, 
(c) A random shuffle, C = 2400; (d) C = 150; (e) C = 15; 
(f) C = 0.002. See text. 



tocorrelation function with or without periodic continu- 
ation. 

In Fig. 20 we show an example fulfilling the null hy- 
pothesis of a rescaled stationary Gaussian linear stochas- 
tic process which has been contaminated by an artefact 
at samples 200-220. The Fourier based schemes are un- 
able to implement the artefact part of the null hypoth- 
esis. They spread the structure given by the artefact 
evenly over the whole time span, resulting in more spikes 
and less predictability. In fact, the null hypothesis of 
a stationary rescaled Gaussian linear stochastic process 
can be rejected at the 95% level of significance using 
nonlinear prediction errors. The artefact would spuri- 
ously be mistaken for nonlinearity. With the program 
randomize_auto_expjrandom, we can exclude the arte- 
fact from the randomization scheme and obtain a correct 
test. 
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As an example of a more exotic cost function, let us 
show the randomization of 500 iterates of the Henon map, 
Fig. 21 (a). Panel (b) shows the output of surrogates 
having the same spectrum and distribution. Starting 
from a random permutation (c), the cost function 

C = (x n -ix n ) + (x n - 2 x n ) 

{ X n—l X 'n-} (•En—l% n ) "P (^n- 2 X n) "P (%n—2%n—l%n) 

+ (xl-ix 2 n ) + {x n -ixl) + (x 3 n _ lXn ) (29) 

is minimized (randomize^generic_exp_random). It in- 
volves are all the higher order autocorrelations which 
would be needed for a least squares fit with the ansatz 
x n = c — ax 2 n _ x + &x„_2 and in this sense fully spec- 
ifies the quadratic structure of the data. The random 
shuffle yields C = 2400, panels (c)-(f) correspond to 
C = 150, 15,0.002 respectively. 

Since the annealing process can be very CPU time con- 
suming, it is important to provide efficient code for the 
cost function. Specifying r max lags for N data points 
requires 0(-/Vr max ) multiplications for the calculation of 
the cost function. An update after a pair has been ex- 
changed, however, can be obtained with 0(T max ) mul- 
tiplications. Often, the full sum or supremum can be 
truncated since after the first terms it is clear that a large 
increase of the cost is unavoidable. The driving Metropo- 
lis algorithm provides the current maximal pcrmissable 
cost for that purpose. 

The computation time required to reach the desired ac- 
curacy depends on the choice and implementation of the 
cost function but also critically on the annealing sched- 
ule. There is a vast literature on simulated annealing 
which cannot be reviewed here. Experimentation with 
cooling schemes should keep in mind the basic concept 
of simulated annealing. At each stage, the system - 
here the surrogate to be created - is kept at a certain 
"temperature". Like in thermodynamics, the tempera- 
ture determines how likely fluctuations around the mean 
energy - here the value of the cost function C - are. At 
temperature T, a deviation of size AC occurs with the 
Boltzmann probability oc cxp(— AC/T). In a Metropo- 
lis simulation, this is achieved by accepting all downhill 
changes (AC < 0), but also uphill changes with proba- 
bility exp(— AC/T). Here the changes are permutations 
of two randomly selected data items. The present im- 
plementation offers an exponential cooling scheme, that 
is, the temperature is lowered by a fixed factor whenever 
one of two conditions is fulfilled: Either a specified num- 
ber of changes has been tried, or a specified number of 
changes has been accepted. Both these numbers and the 
cooling factor can be chosen by the user. If the state is 
cooled too fast it gets stuck, or "freezes" in a false mini- 
mum. When this happens, the system must be "melted" 
again and cooling is taken up at a slower rate. This can 
be done automatically until a goal accuracy is reached. 
It is, however, difficult to predict how many steps it will 
take. The detailed behavior of the scheme is still sub- 
ject to ongoing research and in all but the simplest cases, 



experimentation by the user will be necessary. To fa- 
cilitate the supervision of the cooling, the current state 
is written to a file whenever a substantial improvement 
has been made. Further, the verbosity of the diagnostic 
output can be selected. 



D. Measuring weak nonlinearity 

When testing for nonlinearity, we would like to use 
quantifiers which are optimized for the weak nonlinear- 
ity limit, which is not what most time series methods of 
chaos theory have been designed for. The simple nonlin- 
ear prediction scheme (Sec. IV B) has proven quite use- 
ful in this context. If used as a comparative statistic, it 
should be noted that sometimes seemingly inadequate 
embeddings or neighborhood sizes may lead to rather 
big errors which have, however, small fluctuations. The 
tradeoff between bias and variance may be different from 
the situation where predictions are desired per se. The 
same rationale applies to quantities derived from the cor- 
relation sum. Neither the small scale limit, genuine scal- 
ing, or the Theiler correction are formally necessary in 
a comparative test. However, any temptation to inter- 
pret the results in terms like "complexity" or "dimen- 
sionality" should be resisted, even though "complexity" 
doesn't seem to have an agreed-upon meaning anyway. 
Apart from average prdiction errors, we have found the 
stabilities of short periodic orbits (see Sec. IV C) useful 
for the detectionof nonlinearity in surrogate data tests. 
As an alternative to the phase space based methods, more 
traditional measures of nonlinearity derived from higher 
order autocorrelation functions ( [86], routine autocor3) 
may also be considered. If a time-reversal asymmetry is 
present, its statistical confirmation (routine timerev) is 
a very powerful detector of nonlinearity [87]. Some mea- 
sures of weak nonlinearity are compared systematically 
in Ref. [88]. 



IX. CONCLUSION AND PERSPECTIVES 

The TISEAN project makes available a number of al- 
gorithms of nonlinear time series analysis to people inter- 
ested in applications of the dynamical systems approach. 
To make proper use of these algorithms, it is not essen- 
tial to have witten the programs from scratch, an effort 
we intend to spare the user by making TISEAN public. 
Indispensable, however, is a good knowledge of what the 
programs do, and why they do what they do. The latter 
requires a thorough background in the nonlinear time se- 
ries approach which cannot be provided by this paper but 
rather by textbooks like Refs. [10,2], reviews [11,12,3], 
and the original literature [9]. Here, we have concen- 
trated on the actual implementation as it is realized in 
TISEAN and on examples of the concrete use of the pro- 
grams. 
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A. Important methods which are (still) missing 

Let us finish the discussion by giving some perspec- 
tives on future work. So far, the TISEAN project has 
concentrated on the most common situation of a single 
time series. While for multiple measurements of similar 
nature most programs can be modified with moderate ef- 
fort, a general framework for heterogeneous multivariate 
recordings (say, blood pressure and heart beat) has not 
been established so far in a nonlinear context. Never- 
theless, we feel that concepts like generalized synchrony, 
coherence, or information flow are well worth pursuing 
and at some point should become available to a wider 
community, including applied research. 

Initial experience with nonlinear time series methods 
indicates that some of the concepts may prove useful 
enough in the future to become part of the established 
time series tool box. For this to happen, availability of 
the algorithms and reliable information on their use will 
be essential. The publication of a substantial collection 
of research level programs through the TISEAN project 
may be seen as one step in that direction. However, the 
potential user will still need considerable experience in 
order to make the right decisions - about the suitability 
of a particular method for a specific time series, about the 
selection of parameters, about the interpretation of the 
results. To some extent, these decisions could be guided 
by software that evaluates the data situation and the re- 
sults automatically. Previous experience with black box 
dimension or Lyapunov estimators has not been encour- 
aging, but for some specific problems, "optimal" answers 
can in principle be defined and computed automatically, 
once the optimality criterion is formulated. For exam- 
ple, the prediction programs could be encapsulated in a 
framework that automatically evaluates the performance 
for a range of embedding parameters etc. Of course, 
quantitative assessment of the results is not always easy 
to implement and depends on the purpose of the study. 
As another example, it seems realistic to define "opti- 
mal" Poincare surfaces of section and to find the optimal 
solutions numerically. 

Like in most of the time series literature, the issue of 
stationarity has entered the discussion only as something 
the lack of which has to be detected in order to avoid 
spurious results. Taking this point seriously amounts to 
rejecting a substantial fraction of time series problems, 
including the most prominent examples, that is, most 
data from finance, metereology, and biology. It is quite 
clear that the mere rejection of these challenging prob- 
lems is not satisfactory and we will have to develop tools 
to actually analyse, understand, and predict nonstation- 
ary data. Some suggestions have been made for the de- 
tection of fluctuating control parameters [89-92] . Most of 
these can be seen as continuous versions of the classifica- 
tion problem, another application which is not properly 
represented in TISEAN yet. 

Publishing software, or reviews and textbooks for that 



matter, in a field evolving as rapidly as nonlinear time se- 
ries analysis will always have the character of a snapshot 
of the state at a given time. Having the options either 
to wait until the field has saturated sufficiently or to risk 
that programs, or statements made, will become obsolete 
soon, we chose the second option. We hope that we can 
thus contibutc to the further evolution of the field. 
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